US COVID Data

Data Prep

To prepare our data for EDA we will load it, transform the date, remove all zero rows from the variable we are testing (hospitalizedCurrently), and sort from oldest to newest. This will allow us to easily work through our univariate time series evaluations.

us <- read.csv("https://raw.githubusercontent.com/JaclynCoate/6373_Time_Series/master/TermProject/Data/USdaily.7.26.20.csv", header = T, strip.white = T)
us <- transform(us, date = as.Date(as.character(date), "%Y%m%d"))
us <- subset(us, select = -c(states, dateChecked, hospitalized, lastModified, total, posNeg, totalTestResultsIncrease, hash))
us[is.na(us)] <- 0
#Selecting only those dates with reported current hospitalizations
us <- dplyr::slice(us,1:132)
us = us[order(as.Date(us$date, format = "%Y%m%d")),]
head(us)
##           date positive negative pending hospitalizedCurrently
## 132 2020-03-17    11928    63104    1687                   325
## 131 2020-03-18    15099    84997    2526                   416
## 130 2020-03-19    19770   108407    3016                   617
## 129 2020-03-20    26025   138814    3330                  1042
## 128 2020-03-21    32910   177262    3468                  1436
## 127 2020-03-22    42169   213476    2842                  2155
##     hospitalizedCumulative inIcuCurrently inIcuCumulative
## 132                     55              0               0
## 131                     67              0               0
## 130                     85              0               0
## 129                    108              0               0
## 128                   2020              0               0
## 127                   3023              0               0
##     onVentilatorCurrently onVentilatorCumulative recovered death
## 132                     0                      0         0   122
## 131                     0                      0         0   153
## 130                     0                      0         0   199
## 129                     0                      0         0   267
## 128                     0                      0         0   328
## 127                     0                      0         0   471
##     totalTestResults deathIncrease hospitalizedIncrease negativeIncrease
## 132            75032            22                   13            13707
## 131           100096            31                   12            21893
## 130           128177            46                   18            23410
## 129           164839            68                   23            30407
## 128           210172            61                 1912            38448
## 127           255645           143                 1003            36214
##     positiveIncrease
## 132             3613
## 131             3171
## 130             4671
## 129             6255
## 128             6885
## 127             9259

Stationarity: Current Hospitalizations

It is difficult to assume stationarity for this data due to multiple factors. We are working under the assumption that COVID is a novel virus and that cases, as well as hospitalizations, will eventually return to zero; our current modeling techniques, by contrast, do things such as revert to the mean or mimic previously seen trends. We also see a heavy wandering trend in new cases, and hospitalizations would be dependent on cases as well as on time. We will review the data, see what non-stationary components, if any, reveal themselves, and model the data accordingly.
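As a complementary check (not part of the original workflow), a formal unit-root test can put a number on this suspicion. A minimal sketch, assuming the tseries package is installed; adf.test is its augmented Dickey-Fuller interface:

```r
# Hedged sketch: augmented Dickey-Fuller unit-root test (tseries package,
# not used in the original analysis).
library(tseries)

# H0: the series has a unit root (is non-stationary). Failing to reject H0
# would be consistent with the heavy wandering behavior described above.
adf.test(us$hospitalizedCurrently)
```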

Univariate AR/ARMA Modeling

  1. Original Realization Analysis Traits:
  • Heavy wandering behavior
  • What appears to be some noise that could be pseudo-cyclic behavior hidden by the scale of the series.
ggplot(data = us, aes(x=date, y=hospitalizedCurrently))+
  geom_line(color="orange")+
  labs(title = "Current COVID Hospitalized Cases US", y = "Thousands", x = "") +
    theme_fivethirtyeight()

  1. Sample Realization, ACF, and Spectral Density:
  • Heavy wandering behavior
  • Possible small pseudo-cyclic behavior

ACF:

  • Very slowly dampening behavior that would be consistent with a d=1 ARIMA model.

Spectral Density:

  • Peak at f=0
  • What appears to be a wave through the rest of the graph - this could be hidden seasonality or another frequency peak masked by the pseudo-cyclic behavior mentioned in the realization above.
plotts.sample.wge(us$hospitalizedCurrently, lag.max = 100)

## $autplt
##   [1]  1.000000000  0.967247046  0.928472445  0.883791745  0.834039153
##   [6]  0.779901780  0.722099406  0.660670233  0.595507232  0.527652159
##  [11]  0.459798098  0.392707355  0.325070497  0.257394633  0.190076427
##  [16]  0.123107742  0.058045025 -0.005592108 -0.062510518 -0.114255855
##  [21] -0.163281121 -0.206979530 -0.242176807 -0.275097030 -0.300626756
##  [26] -0.323018458 -0.340862401 -0.357234304 -0.371125261 -0.380223777
##  [31] -0.387733922 -0.393904174 -0.399188337 -0.403702529 -0.407535556
##  [36] -0.409058093 -0.406032310 -0.402291057 -0.397307333 -0.392192952
##  [41] -0.385158345 -0.377444480 -0.367999137 -0.357234818 -0.345390268
##  [46] -0.333657101 -0.320405088 -0.307093151 -0.293895002 -0.279566463
##  [51] -0.263105425 -0.246363949 -0.229539058 -0.213137515 -0.196728805
##  [56] -0.180700291 -0.164151991 -0.145439160 -0.126569221 -0.107984741
##  [61] -0.090254314 -0.072242265 -0.054130291 -0.034756123 -0.014095709
##  [66]  0.007187366  0.028500485  0.048915792  0.068729196  0.088785284
##  [71]  0.109636722  0.130889993  0.152506791  0.173725394  0.193391213
##  [76]  0.211419137  0.228441065  0.245143327  0.260841689  0.274513731
##  [81]  0.286671443  0.296525403  0.304596844  0.311148685  0.317183678
##  [86]  0.321808447  0.324325398  0.323899788  0.321013182  0.315477248
##  [91]  0.307881329  0.298203689  0.286925511  0.272658784  0.256479904
##  [96]  0.236600281  0.213678525  0.188829126  0.163809020  0.138368627
## [101]  0.111541632
## 
## $freq
##  [1] 0.007575758 0.015151515 0.022727273 0.030303030 0.037878788
##  [6] 0.045454545 0.053030303 0.060606061 0.068181818 0.075757576
## [11] 0.083333333 0.090909091 0.098484848 0.106060606 0.113636364
## [16] 0.121212121 0.128787879 0.136363636 0.143939394 0.151515152
## [21] 0.159090909 0.166666667 0.174242424 0.181818182 0.189393939
## [26] 0.196969697 0.204545455 0.212121212 0.219696970 0.227272727
## [31] 0.234848485 0.242424242 0.250000000 0.257575758 0.265151515
## [36] 0.272727273 0.280303030 0.287878788 0.295454545 0.303030303
## [41] 0.310606061 0.318181818 0.325757576 0.333333333 0.340909091
## [46] 0.348484848 0.356060606 0.363636364 0.371212121 0.378787879
## [51] 0.386363636 0.393939394 0.401515152 0.409090909 0.416666667
## [56] 0.424242424 0.431818182 0.439393939 0.446969697 0.454545455
## [61] 0.462121212 0.469696970 0.477272727 0.484848485 0.492424242
## [66] 0.500000000
## 
## $db
##  [1]   8.4297671  13.7645189  12.5249640   8.3502117   4.2830812
##  [6]  -0.7743908  -1.6306179  -1.6820655  -1.1397181  -1.9660002
## [11]  -3.8470478  -4.5601791  -5.7979845  -6.6448260  -8.1613031
## [16]  -7.7172776  -7.0289454  -6.7231288 -11.3437841  -7.3292891
## [21]  -8.5570448  -9.8123688 -12.3352898 -12.0470153 -10.4188757
## [26] -11.1989248 -11.5484834 -11.3001319 -11.8583444 -13.4758586
## [31] -13.5752424 -13.1601144 -13.4822098 -11.8751077 -12.0098408
## [36] -11.6652933 -14.1857608 -18.7098443 -15.1452524 -14.2793937
## [41] -13.6312562 -13.4537203 -13.8449147 -14.8421169 -15.8144787
## [46] -16.3497508 -16.1392884 -14.5229548 -13.9096257 -14.2979673
## [51] -15.1762427 -16.9533791 -16.6240685 -15.5004175 -14.8930293
## [56] -15.3404690 -17.3904939 -15.6412620 -14.5937114 -14.5227461
## [61] -16.6299676 -17.9885318 -16.8369446 -17.2106857 -15.1218462
## [66] -13.7373278
## 
## $dbz
##  [1]  10.8000075  10.4225032   9.7877656   8.8879309   7.7133502
##  [6]   6.2551734   4.5113555   2.5000875   0.2870472  -1.9727273
## [11]  -4.0185421  -5.5981361  -6.6776527  -7.4337186  -8.0352652
## [16]  -8.5435576  -8.9672493  -9.3368790  -9.7184801 -10.1759984
## [21] -10.7327082 -11.3585971 -11.9855900 -12.5448786 -13.0053273
## [26] -13.3805586 -13.7009145 -13.9838700 -14.2295004 -14.4361103
## [31] -14.6143036 -14.7849900 -14.9660787 -15.1624073 -15.3669935
## [36] -15.5703444 -15.7685492 -15.9634820 -16.1567949 -16.3451059
## [41] -16.5214784 -16.6812308 -16.8255589 -16.9589464 -17.0832210
## [46] -17.1948352 -17.2887237 -17.3653174 -17.4335644 -17.5062347
## [51] -17.5910578 -17.6848979 -17.7756255 -17.8505813 -17.9054045
## [56] -17.9463455 -17.9845791 -18.0276392 -18.0745244 -18.1172420
## [61] -18.1468027 -18.1590805 -18.1567596 -18.1470189 -18.1375821
## [66] -18.1338177
  1. Overfit tables
  • Since we are seeing heavy wandering behavior, we will use overfit tables to see if we can surface any (1-B) factors that have roots very near the unit circle.

    • Below we can clearly see one (1-B) factor that has a root nearly on the unit circle.
est.ar.wge(us$hospitalizedCurrently,p=6,type='burg')
## 
## Coefficients of Original polynomial:  
## 1.2113 0.0324 -0.0917 -0.0697 0.0013 -0.1010 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.9283B+0.9354B^2    1.0308+-0.0812i      0.9672       0.0125
## 1+0.9678B+0.3408B^2   -1.4200+-0.9583i      0.5838       0.4055
## 1-0.2507B+0.3168B^2    0.3957+-1.7322i      0.5628       0.2143
##   
## 
## $phi
## [1]  1.211269271  0.032440618 -0.091744816 -0.069675720  0.001316006
## [6] -0.100966743
## 
## $res
##   [1]  -292.92920    42.23585   -92.70196  -102.02123  -334.70493
##   [6]  -145.22402  -403.96302    34.46467   -83.49135  1318.62578
##  [11]  1377.16436  -880.96039  -636.85824  -375.11269   230.53004
##  [16]   589.34824  -162.87147   523.10092  2268.40156  -856.96328
##  [21]  1307.41009  4905.29484 -2079.33995  2125.99214 -1561.57046
##  [26]  -286.55877 -2910.62575  -721.73533  2310.48679  -741.05699
##  [31] -1458.33820  -885.25596 -1020.67917  -806.18397  1240.82698
##  [36]  3855.28027  -594.27955  -332.98691 -1561.70454   599.16915
##  [41]  -668.18241   910.28844   838.47931   588.30675  -699.57648
##  [46]   648.28101  -290.27821  -792.40531   664.36263  1721.18828
##  [51]  -247.47143  -655.56939 -1027.72246  -316.65232  -555.31836
##  [56]   670.18033  2208.79087   -46.31492  -741.64210  -711.16638
##  [61]  -345.85017  -802.33939  1055.50075  1019.45748   423.37585
##  [66]  -263.76408  -870.20100  -934.79262  -213.25332   762.11152
##  [71]   755.17733   958.19565  -257.19616 -1107.16022 -1097.48478
##  [76]  -392.31146    25.12350   266.29333  -150.37710    11.46933
##  [81]   -24.36326  -230.98962  -270.72490    81.75412   189.17778
##  [86]  -227.80796 -1100.72037  -289.22665  -380.30212  -216.89601
##  [91]   321.96954   641.36609   239.90315  -394.37919   -45.80287
##  [96]  -932.43620   101.31456   444.74934  1155.39447   225.84784
## [101]   -36.98998  -963.47841    90.12262  -630.67985   649.45247
## [106]  1114.69662   348.88102    78.19229  -682.63199  -511.33045
## [111]   -33.55819   480.50704  1404.41245   468.10554  -172.16756
## [116]  6696.26244 -2026.25073 -1457.43811  -294.49345   416.89154
## [121]  -780.81322   714.03055  -299.92450  -595.44881    59.38061
## [126]   536.60736   999.48047   240.91788   133.31315  -233.45378
## [131]  -301.48621  -282.07861
## 
## $avar
## [1] 1358148
## 
## $aic
## [1] 14.22769
## 
## $aicc
## [1] 15.25171
## 
## $bic
## [1] 14.38057
  1. Difference the data based on the surfaced (1-B) factor
  • Once the data has been differenced, we see something that looks much closer to a stationary data set. However, we have also surfaced what appears to be a small seasonal component: higher spikes surface in the ACF at lags 7 and 14, which leads us to believe there is a 7-day seasonal component.
us.diff = artrans.wge(us$hospitalizedCurrently, phi.tr = 1, lag.max = 100)

acf(us.diff)

  1. Seasonality Transformation
  • Above we surfaced what appears to be a 7-day seasonal trend. We will now transform the data with s = 7.
us.diff.seas = artrans.wge(us.diff,phi.tr = c(0,0,0,0,0,0,1), lag.max = 100)

  1. Diagnose Model w/ aic.wge
  • When we diagnose the best models for our stationary data set, the AIC5 function in R selects an ARMA(5,1) model under AIC, while BIC selects an AR(2). The AR(2) model is consistent with our pseudo-cyclic data as well as the damped, cyclical sample autocorrelations produced by the transformed data; the ARMA(5,1) could also produce these same traits. We will move forward and compare these two models.
aic5.wge(us.diff.seas)
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##       p    q        aic
## 17    5    1   14.57449
## 10    3    0   14.60386
## 13    4    0   14.61477
## 6     1    2   14.61527
## 8     2    1   14.61581
aic5.wge(us.diff.seas,type = "bic")
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  bic
##       p    q        bic
## 7     2    0   14.68775
## 10    3    0   14.69484
## 6     1    2   14.70625
## 8     2    1   14.70679
## 3     0    2   14.72173
  1. Diagnose white noise
  • Both Ljung-Box tests reject the null hypothesis, with p-values below the 0.05 alpha significance level.
ljung.wge(us.diff.seas)$pval
## Obs 0.2522573 0.4096169 0.2803795 0.1452644 0.1021298 0.1354335 -0.2827722 0.07264095 -0.09804601 -0.01062144 0.004067316 -0.04130786 -0.03854601 -0.02866098 -0.09193388 -0.05167819 -0.1056372 -0.1193326 -0.08827532 -0.1061183 -0.1054081 -0.036613 -0.0654078 -0.03489691
## [1] 1.859792e-06
ljung.wge(us.diff.seas, K=48)$pval
## Obs 0.2522573 0.4096169 0.2803795 0.1452644 0.1021298 0.1354335 -0.2827722 0.07264095 -0.09804601 -0.01062144 0.004067316 -0.04130786 -0.03854601 -0.02866098 -0.09193388 -0.05167819 -0.1056372 -0.1193326 -0.08827532 -0.1061183 -0.1054081 -0.036613 -0.0654078 -0.03489691 0.02370281 -0.02743256 -0.02153882 -0.03584166 -0.1039903 -0.02877203 -0.03843455 -0.07298731 -0.03237146 -0.02495354 0.008256594 0.02396111 -0.06576515 -0.04980251 -0.04736059 -0.04415211 -0.03591561 -0.04413551 -0.03016308 0.03982533 0.008384104 0.02503231 0.03614677 0.01834679
## [1] 0.003990771
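For reference, base R's stats package provides the same statistic through Box.test (the report itself uses ljung.wge from tswge); a small self-contained sketch on simulated noise:

```r
# Base-R counterpart of the Ljung-Box check above.
set.seed(1)
x <- rnorm(132)  # pure white noise, same length as the series, for illustration
bt <- Box.test(x, lag = 24, type = "Ljung-Box")
bt
# For true white noise we would typically fail to reject H0 (p-value above
# 0.05); the residual checks above, by contrast, reject H0.
```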
  1. Estimate Phis and Thetas
  • AIC Phi and Theta Estimates
est.us.diff.seasAIC = est.arma.wge(us.diff.seas, p = 5, q=1)
## 
## Coefficients of Original polynomial:  
## -0.6097 0.4850 0.5047 0.0643 -0.2269 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1+0.9305B             -1.0747               0.9305       0.5000
## 1+0.9405B+0.5775B^2   -0.8143+-1.0337i      0.7599       0.3562
## 1-1.2613B+0.4223B^2    1.4934+-0.3713i      0.6498       0.0388
##   
## 
mean(us$hospitalizedCurrently)
## [1] 39625.17
  • BIC Phi Estimates
est.us.diff.seasBIC = est.arma.wge(us.diff.seas, p = 2)
## 
## Coefficients of Original polynomial:  
## 0.1594 0.3740 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-0.6964B              1.4359               0.6964       0.0000
## 1+0.5370B             -1.8622               0.5370       0.5000
##   
## 
mean(us$hospitalizedCurrently)
## [1] 39625.17

Univariate ARIMA(5,1,1), s=7 Forecasting

  • 7-Day Forecast
shortARMA <- fore.aruma.wge(us$hospitalizedCurrently, phi = est.us.diff.seasAIC$phi, theta = est.us.diff.seasAIC$theta, d= 1, s = 7, n.ahead = 7, lastn = F, limits = T)

  • AIC
est.us.diff.seasAIC$aic
## [1] 14.57449
  • Windowed ASE: 14,880,281
phis = est.us.diff.seasAIC$phi
thetas = est.us.diff.seasAIC$theta

trainingSize = 24
horizon = 7
ASEHolder = numeric()

for( i in 1:(124-(trainingSize + horizon) + 1))
{
  forecasts = fore.aruma.wge(us$hospitalizedCurrently[i:(i+(trainingSize-1))],phi = phis, theta = thetas, s = 7, d = 1, n.ahead = horizon)
  
  ASE = mean((us$hospitalizedCurrently[(trainingSize+i):(trainingSize+ i + (horizon) - 1)] - forecasts$f)^2)
         
  ASEHolder[i] = ASE
}
invisible(ASEHolder)
hist(ASEHolder)

WindowedASE = mean(ASEHolder)

summary(ASEHolder)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     39401    754593   2108050  14880281   6998084 176230307
WindowedASE
## [1] 14880281
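The rolling-window loop above is repeated verbatim for every model and horizon below; it can be expressed once as a helper. A sketch in base R, where fore_fun is a stand-in for a forecasting call such as fore.aruma.wge (the helper name and interface are illustrative, not from the report):

```r
# Windowed (rolling-origin) ASE: slide a training window of length
# trainingSize across x, forecast `horizon` steps ahead each time, and
# record the mean squared forecast error for each window.
windowed_ase <- function(x, fore_fun, trainingSize, horizon) {
  n <- length(x)
  ases <- numeric(n - (trainingSize + horizon) + 1)
  for (i in seq_along(ases)) {
    train  <- x[i:(i + trainingSize - 1)]
    f      <- fore_fun(train, horizon)
    actual <- x[(trainingSize + i):(trainingSize + i + horizon - 1)]
    ases[i] <- mean((actual - f)^2)
  }
  ases
}

# Usage with a naive "repeat the last value" forecaster on a toy series:
ases <- windowed_ase(1:40, function(train, h) rep(tail(train, 1), h),
                     trainingSize = 24, horizon = 7)
mean(ases)  # -> 20 for this linear toy series
```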
  • ASE: 73,605,156
fs = fore.aruma.wge(us$hospitalizedCurrently[i:(i+(trainingSize+horizon)-1)],phi = phis, theta = thetas, s = 7, d = 1,n.ahead = 7, lastn = TRUE)

ASE = mean((us$hospitalizedCurrently[(i+trainingSize):(i+(trainingSize+horizon)-1)] - fs$f )^2)
ASE
## [1] 73605156
  • 90-Day Forecast
longARMA <- fore.aruma.wge(us$hospitalizedCurrently, phi = est.us.diff.seasAIC$phi, theta = est.us.diff.seasAIC$theta, d= 1, s = 7, n.ahead = 90, lastn = F, limits = F)

  • Windowed ASE: 18,103,000,000
phis = est.us.diff.seasAIC$phi
thetas = est.us.diff.seasAIC$theta

trainingSize = 24
horizon = 90
ASEHolder = numeric()

for( i in 1:(124-(trainingSize + horizon) + 1))
{
  forecasts = fore.aruma.wge(us$hospitalizedCurrently[i:(i+(trainingSize-1))],phi = phis, theta = thetas, s = 7, d = 1,n.ahead = horizon)
  
  ASE = mean((us$hospitalizedCurrently[(trainingSize+i):(trainingSize+ i + (horizon) - 1)] - forecasts$f)^2)
         
  ASEHolder[i] = ASE
}
invisible(ASEHolder)
hist(ASEHolder)

WindowedASE = mean(ASEHolder)

summary(ASEHolder)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 1.083e+08 6.202e+08 8.410e+09 1.810e+10 2.989e+10 6.151e+10
WindowedASE
## [1] 1.8103e+10
  • ASE: 108,278,159
fs = fore.aruma.wge(us$hospitalizedCurrently[i:(i+(trainingSize+horizon)-1)],phi = phis, theta = thetas, s = 7, d = 1, n.ahead = 90, lastn = T)

ASE = mean((us$hospitalizedCurrently[(i+trainingSize):(i+(trainingSize+horizon)-1)] - fs$f )^2)
ASE
## [1] 108278159

Univariate ARIMA(2,1,0), s=7 Forecasting

  • 7-Day Forecast
shortAR <- fore.aruma.wge(us$hospitalizedCurrently, phi = est.us.diff.seasBIC$phi, d=1, s=7, n.ahead = 7, lastn = FALSE, limits = FALSE)

  • AIC
est.us.diff.seasBIC$aic
## [1] 14.61952
  • Windowed ASE: 15,546,758
phis = est.us.diff.seasBIC$phi
thetas = est.us.diff.seasBIC$theta

trainingSize = 24
horizon = 7
ASEHolder = numeric()

for( i in 1:(124-(trainingSize + horizon) + 1))
{
  forecasts = fore.aruma.wge(us$hospitalizedCurrently[i:(i+(trainingSize-1))],phi = phis, theta = thetas, s = 7, d = 1, n.ahead = horizon)
  
  ASE = mean((us$hospitalizedCurrently[(trainingSize+i):(trainingSize+ i + (horizon) - 1)] - forecasts$f)^2)
         
  ASEHolder[i] = ASE
}
invisible(ASEHolder)
hist(ASEHolder)

WindowedASE = mean(ASEHolder)

summary(ASEHolder)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     77157    618880   2264284  15546758   6837785 202401118
WindowedASE
## [1] 15546758
  • ASE: 60,809,514
fs = fore.aruma.wge(us$hospitalizedCurrently[i:(i+(trainingSize+horizon)-1)],phi = phis, theta = thetas, s = 7, d = 1,n.ahead = 7, lastn = TRUE)

ASE = mean((us$hospitalizedCurrently[(i+trainingSize):(i+(trainingSize+horizon)-1)] - fs$f )^2)
ASE
## [1] 60809514
  • 90-Day Forecast
longAR <- fore.aruma.wge(us$hospitalizedCurrently, phi = est.us.diff.seasBIC$phi, s= 7, d = 1, n.ahead = 90, lastn = FALSE, limits = FALSE)

  • Windowed ASE: 19,427,666,679
phis = est.us.diff.seasBIC$phi
thetas = est.us.diff.seasBIC$theta

trainingSize = 24
horizon = 90
ASEHolder = numeric()

for( i in 1:(124-(trainingSize + horizon) + 1))
{
  forecasts = fore.aruma.wge(us$hospitalizedCurrently[i:(i+(trainingSize-1))],phi = phis, theta = thetas, s = 7, d = 1,n.ahead = horizon)
  
  ASE = mean((us$hospitalizedCurrently[(trainingSize+i):(trainingSize+ i + (horizon) - 1)] - forecasts$f)^2)
         
  ASEHolder[i] = ASE
}
ASEHolder
##  [1] 61052232487 54487603507 35485946674 27887241909 17356997661
##  [6]  7212077636  8960477292   598137404   130355537   347397993
## [11]   185865366
hist(ASEHolder)

WindowedASE = mean(ASEHolder)

summary(ASEHolder)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 1.304e+08 4.728e+08 8.960e+09 1.943e+10 3.169e+10 6.105e+10
WindowedASE
## [1] 19427666679
  • ASE: 185,865,366
fs = fore.aruma.wge(us$hospitalizedCurrently[i:(i+(trainingSize+horizon)-1)],phi = phis, theta = thetas, s = 7, d = 1,n.ahead = 90, lastn = TRUE)

ASE = mean((us$hospitalizedCurrently[(i+trainingSize):(i+(trainingSize+horizon)-1)] - fs$f )^2)
ASE
## [1] 185865366

Univariate Multilayered Perceptron (MLP) / Neural Network Model

For our univariate NN model we created a training and a test data set, which allows us to cross-validate our model performance. This first NN model uses mostly default parameters, to see how the mlp function does at producing a model with few settings. However, with so little data we are also curious how leveraging all of the data changes the trend of the forecast. So, we will model them side by side to see the difference in the forecasts they produce.

  1. Creating train / test data set
library(nnfor)
## Loading required package: forecast
## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Registered S3 methods overwritten by 'forecast':
##   method             from    
##   fitted.fracdiff    fracdiff
##   residuals.fracdiff fracdiff
usTrain.nn = ts(us$hospitalizedCurrently[1:125])
  1. Fitting NN model
  1. Fitted model on the train data set. While we will continue to build the model on the train data set to get our ASE, etc., we also wanted to see just how much of a difference merely 7 days of data can make to a model.
us.nn.fit = mlp(usTrain.nn, outplot = T, comb = "mean", m=7, reps = 50)

plot(us.nn.fit)

  1. Fitted a model on the full data set. The same model structure is developed, but we want to see how this affects our forecast; we suspect a single data point up or down can drastically change the trend of the forecast.
us.nn.fit2 = mlp(ts(us$hospitalizedCurrently), outplot = T, comb = "mean", m=7, reps = 50)

plot(us.nn.fit2)

  1. Forecast horizon/step forward
  1. With just the trained data set we see a slight upward trend in the 7-day forecast. The 90-day forecast, however, shows a much larger lift in trend towards the end of the horizon, and the plausible range of outcomes is also wide. The NN models give us a glimpse into how difficult it is to forecast something like COVID hospitalization cases: with such limited data, and no way to know whether we have completed a full ‘cycle’ to predict against, many outcomes remain possible, and the NN forecast shows this through the large range of possible outcomes around the mean shown in blue in the center.
  • 7-Day Forecast
us.nn.fit.fore7 = forecast(us.nn.fit, h=7)
plot(us.nn.fit.fore7)

  • 90-Day Forecast
us.nn.fit.fore90 = forecast(us.nn.fit, h=90)
plot(us.nn.fit.fore90)

  1. For the 7-day forecast we still see an extensive range for the NN models, but we do see a change in trend for the full data set compared to the trained one. Those extra days produce a downward-trending projection instead of a flat level of COVID hospitalized patients. This is essentially the difference between being able to reduce supplies and needing supplies to remain the same. It is important to take into account, and it means a daily model update would be needed to accurately forecast any future trend or predictions. For the 90-day forecast we see a similar change in trend: based on the trained data set alone we would expect a large increase in trend over the 90 days. This could be useful; however, it seems to perform directly opposite to what we expect from a novel virus.
  • 7-Day Forecast
us.nn.fit.fore2 = forecast(us.nn.fit2, h=7)
plot(us.nn.fit.fore2)

  • 90-Day Forecast
us.nn.fit.fore2 = forecast(us.nn.fit2, h=90)
plot(us.nn.fit.fore2)

  1. Plot forecast against test set
plot(us$hospitalizedCurrently[126:132], type = "l", ylim = c(55000, 80000))
lines(seq(1:7), us.nn.fit.fore7$mean, col = "blue")

  1. ASE
  • 7-Day: 1,254,033
ASEus.nn.fit.fore7 = mean((us$hospitalizedCurrently[126:132]-us.nn.fit.fore7$mean)^2)
ASEus.nn.fit.fore7
## [1] 1254033

  • 90-Day: 1,501,139,033

ASEus.nn.fit.fore90 = mean((us$hospitalizedCurrently[43:132]-us.nn.fit.fore90$mean)^2)
ASEus.nn.fit.fore90
## [1] 1501139033

MLP NN Model Analysis

We completed a default neural network model. With so many ways to tune a neural network, we knew this would not be our best model in this case. So, we moved forward with a hyper-tuned neural network model for our ensemble, which allows us to calculate many windowed ASEs and compare those models against each other.

Ensemble Model / Hyper tuned NN Model

library(tswgewrapped)
  1. Train / Test Data Sets
data_train.u <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[1:122], positiveIncrease = rnorm(122, 0, .0001))
data_test.u <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[123:132], positiveIncrease = rnorm(10, 0, .0001))
  1. Hyper tune parameters
  • Here we run a specialty function from the tswgewrapped package that performs a random grid search, tuning all hyperparameters to find the combination with the lowest windowed ASE.
# search for best NN hyperparameters in given grid
model.u = tswgewrapped::ModelBuildNNforCaret$new(data = data_train.u, var_interest = "hospitalizedCurrently",
                                               search = 'random', tuneLength = 5, parallel = TRUE,
                                               batch_size = 50, h = 7, m = 7,
                                               verbose = 1)
## Registered S3 method overwritten by 'lava':
##   method     from   
##   print.pcor greybox
## Aggregating results
## Selecting tuning parameters
## Fitting reps = 16, hd = 1, allow.det.season = FALSE on full training set
## - Total Time for training: : 45.327 sec elapsed
  1. The windowed ASEs associated with the grid of hyperparameters are shown in the table and heatmap below.
res.u <- model.u$summarize_hyperparam_results()
res.u
##   reps hd allow.det.season     RMSE      ASE   RMSESD    ASESD
## 1   10  2             TRUE 3643.591 28103090 4038.572 63390082
## 2   11  3            FALSE 3650.264 29616976 4233.415 69910750
## 3   16  1            FALSE 2981.220 11743988 1772.554 10147156
## 4   16  3             TRUE 3658.018 29929078 4266.472 70759678
## 5   22  3            FALSE 3666.595 29914495 4256.481 70626149
model.u$plot_hyperparam_results()

  1. The best parameters are shown in the table below: based on this grid search, 16 repetitions, 1 hidden node, and allow.det.season = FALSE.
best.u <- model.u$summarize_best_hyperparams()
best.u
##   reps hd allow.det.season
## 3   16  1            FALSE
  1. Windowed ASE of 11,743,988.
final.ase.u <- dplyr::filter(res.u, reps == best.u$reps &
                    hd == best.u$hd &
                    allow.det.season == best.u$allow.det.season)[['ASE']]
final.ase.u
## [1] 11743988
  1. Ensemble model characteristics and plot
# Ensemble / Hypertuned NN Model
caret_model.u = model.u$get_final_models(subset = 'a')
caret_model.u$finalModel
## MLP fit with 1 hidden node and 16 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3)
## 1 regressor included.
## - Regressor 1 lags: (3)
## Forecast combined using the median operator.
## MSE: 1467574.9522.
#Plot Final Model
plot(caret_model.u$finalModel)

  1. Train ensemble model
  • Since we came across an interesting outcome with our default NN model above when we reviewed the forecasts on the trained data set vs. the full data set, we will do the same with our hyper-tuned ensemble model.
  1. First we build our ensemble model on the trained data set.
#Ensemble model trained data
ensemble.mlp.u1 = nnfor::mlp(usTrain.nn, outplot = T, reps = 16, hd = 1, allow.det.season = F)

ensemble.mlp.u1
## MLP fit with 1 hidden node and 16 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3)
## Forecast combined using the median operator.
## MSE: 1452364.1472.
  1. Next we build our model on the entire data set.
#Ensemble model
ensemble.mlp.u2 = nnfor::mlp(ts(us$hospitalizedCurrently), outplot = T, reps = 16, hd = 1, allow.det.season = F)

ensemble.mlp.u2
## MLP fit with 1 hidden node and 16 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3)
## Forecast combined using the median operator.
## MSE: 1386442.1448.
  • 7-Day
  1. First we produce a 7-day forecast from the trained data set. We see a flattening of the forecast over the next 7 days, with a slight downward trend.
fore7.1.u = forecast(ensemble.mlp.u1 , h=7)
plot(fore7.1.u)

  1. Next we forecast the 7-day horizon for our full data set. We see a much stronger downward trend in this 7-day forecast.
fore7.2.u = forecast(ensemble.mlp.u2, h=7)
plot(fore7.2.u)

  • 90-Day
  1. With our 90-day trained data set forecast we see a very strong downward trend, with the possibility of a slightly less straight variation of this trend (shown by the shadowed line slightly above the highlighted blue mean).
fore90.1.u = forecast(ensemble.mlp.u1, h=90)
plot(fore90.1.u)

  1. Full data set. We see a clear downward trend, with COVID hospitalized cases eventually reaching zero. The possibilities all stay close to the mean, and there does not seem to be much chance of deviation from this model.
fore90.2.u = forecast(ensemble.mlp.u2, h=90)
plot(fore90.2.u)

Univariate Model Analysis

Upon completion of the above models, the most important takeaway is that each data point is essential in determining the trend of the COVID virus. With cross-validation methods we can see a trend, but as each new data point becomes part of the model the trend alters day by day. It will be essential moving forward that models are updated daily in order to capture a good trend and therefore retain the ability to forecast hospital needs and the severity of COVID.

When investigating these models, it became clear that the 90-day forecasts were simply repeating the trend and seasonality without the kind of extrapolation we would trust for long-term forecasting, so we would only recommend the short 7-day forecasts for predicting hospital equipment and staffing needs. The ensemble model had the lowest windowed ASE and is what we recommend moving forward for these short-term forecasts.

Univariate Model Performance Breakdown

ARIMA(5,1,1), s=7 Forecasting

  • 7-Day Windowed ASE
    • 14,880,281
  • 90-Day Windowed ASE
    • 18,103,000,000

ARIMA(2,1,0), s=7 Forecasting

  • 7-Day Windowed ASE
    • 15,546,758
  • 90-Day Windowed ASE
    • 19,427,666,679

Default Neural Network Model

  • 7-Day ASE
    • 1,254,033
  • 90-Day ASE
    • 1,501,139,033

Ensemble Model: Hyper Tuned Neural Network Model

  • 7-Day Windowed ASE
    • 11,743,988